[Figure: 1-bit PointNet pipeline, n×3 input → Transform → n×64 → Transform → n×1024 → MaxPooling → 1×1024 → output scores; real-valued FC layers alongside Bi-FC layers whose weights and activations are binarized with sign(·) and rescaled by α; forward propagation through the Bi-FC layer and backward propagation via EM+STE and STE.]
FIGURE 6.4
Outline of the 1-bit PointNet obtained by our POEM on the classification task. We keep the first and last fully connected layers real-valued (shown with horizontal stripes). We give the detailed forward and backward propagation process of POEM, where EM denotes the Expectation-Maximization algorithm and STE denotes the Straight-Through Estimator.
where we set $a_1 = -1$ and $a_2 = +1$. Then $P_{R \to B}(\cdot)$ is equivalent to the sign function, i.e., $\mathrm{sign}(\cdot)$.
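As a minimal sketch of this projection (assuming a PyTorch setting and the common convention of mapping zeros to +1), $P_{R \to B}(\cdot)$ can be written as:

```python
import torch

def project_to_binary(x: torch.Tensor) -> torch.Tensor:
    """P_{R->B}(x) with a1 = -1 and a2 = +1, i.e., the sign function.

    Zeros are mapped to +1 so every entry lands in {-1, +1}.
    """
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

x = torch.tensor([1.23, 0.12, -0.66])
print(project_to_binary(x))  # tensor([ 1.,  1., -1.])
```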
However, the binarization procedure achieved by $P_{R \to B}(x)$ is sensitive to disturbance when $x$ follows a Gaussian distribution, as in XNOR-Net. That is, the binarization results are susceptible to the noise in the raw point cloud data, as shown in Fig. 6.3. To address this issue, we first define an objective as
\[
\arg\min_{x} \; P_{R \to B}(x) - P_{R \to B}(x + \gamma), \qquad (6.37)
\]
where $\gamma$ denotes a disturbance.
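To make this sensitivity concrete, the short sketch below (PyTorch assumed; the disturbance scale 0.05 is illustrative) counts how many binarized entries flip their sign when a small Gaussian disturbance $\gamma$ is added; entries of $x$ close to zero flip easily:

```python
import torch

torch.manual_seed(0)

x = torch.randn(10_000)               # roughly Gaussian activations
gamma = 0.05 * torch.randn_like(x)    # small disturbance

# Fraction of entries whose binarized value changes under the disturbance
flipped = (torch.sign(x) != torch.sign(x + gamma)).float().mean()
print(f"fraction of binarized entries that flip: {flipped:.3f}")
```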
Another objective is defined to minimize the geometric distance between $x$ and $P_{R \to B}(x)$ as
\[
\arg\min_{x,\alpha} \; \| x - \alpha P_{R \to B}(x) \|_2^2, \qquad (6.38)
\]
where $\alpha$ is an auxiliary scale factor. Recent works on binarized neural networks (BNNs) [199, 159] solve this objective explicitly as
\[
\alpha = \frac{\| x \|_1}{\mathrm{size}(x)}, \qquad (6.39)
\]
where $\mathrm{size}(x)$ denotes the number of elements in $x$. However, this solution neglects that $\alpha$ also influences the output of the 1-bit layer. In contrast, we take this shortcoming into account and modify the learning objective in our POEM.
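For reference, before turning to POEM's modified objective, the following sketch (PyTorch assumed; the data and the search grid are illustrative) computes the closed-form scale factor of Eq. (6.39) and checks numerically that, with $x$ fixed, it minimizes the objective in Eq. (6.38):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1024)

# Closed-form scale factor from Eq. (6.39): alpha = ||x||_1 / size(x)
alpha_closed = x.abs().sum() / x.numel()

# Brute-force check of Eq. (6.38): ||x - alpha * sign(x)||_2^2 over a grid of alphas
alphas = torch.linspace(0.0, 2.0, 2001)
losses = torch.stack([(x - a * torch.sign(x)).pow(2).sum() for a in alphas])
alpha_best = alphas[losses.argmin()]

print(f"closed-form alpha: {alpha_closed:.4f}, grid-search alpha: {alpha_best:.4f}")
```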
6.3.2 Binarization Framework of POEM
We briefly introduce the framework based on our POEM, as shown in Fig. 6.4. We extend the binarization process from 2D convolutions (XNOR-Net) to the fully connected layers (FCs) used for feature extraction, termed 1-bit fully connected (Bi-FC) layers, which rely on extremely efficient bit-wise operations (XNOR and bit-count) over lightweight binary weights and activations.
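A minimal sketch of such a Bi-FC layer is given below (PyTorch assumed; the class name BiFC and the plain straight-through estimator are our own illustrative choices, and the EM-based optimization that POEM adds is omitted). With $\{-1, +1\}$ operands, the matrix product in the forward pass is exactly what XNOR and bit-count operations implement:

```python
import torch
import torch.nn as nn

class BiFC(nn.Module):
    """Sketch of a 1-bit fully connected (Bi-FC) layer: binary weights and
    activations with a scale factor, trained with the straight-through
    estimator (STE). Illustrative only; POEM's EM-based weight
    reconstruction is omitted."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    @staticmethod
    def _binarize(x: torch.Tensor) -> torch.Tensor:
        # sign(x) in the forward pass, identity gradient (STE) in the backward pass
        return x + (torch.sign(x) - x).detach()

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        alpha = self.weight.abs().mean()   # scale factor, as in Eq. (6.39)
        bw = self._binarize(self.weight)   # binary weights
        ba = self._binarize(a)             # binary activations
        # With {-1, +1} operands, this matmul corresponds to XNOR + bit-count.
        return alpha * ba.matmul(bw.t())

layer = BiFC(64, 128)
out = layer(torch.randn(8, 64))   # e.g., 8 points with 64-dim features
print(out.shape)                  # torch.Size([8, 128])
```

In a full 1-bit PointNet, such layers would replace the intermediate FC layers, while the first and last layers remain real-valued as in Fig. 6.4.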